Supplementary Material: A Richly Annotated Catalog of Surface Appearance
Abstract
In this supplementary material, we provide further details on each of our Amazon Mechanical Turk (MTurk) tasks and on our implementation. The data from all of our experiments is fully open and available online at http://opensurfaces.cs.cornell.edu/. Statistics about the experiments and the results are also hosted online, and they will be continually updated as the database grows. Task descriptions include anonymized user feedback, quoted with the original spelling and capitalization.

2 Downloading images

Images were downloaded from Flickr and selected according to several criteria. A complete description of the selection process follows.

• Queries were constructed as a scene category + tag + "-hdr".
  – "-hdr": we avoided photos that use HDR tonemapping, since we found that such images often aimed for artistic effects rather than realistic appearance.
• The extra tag is discarded and photos are grouped by scene. Duplicates are detected by md5 hash of the file (to facilitate running scripts multiple times). The extra tags are used only to encourage nicer photos; we found that these tags resulted in images that were less cluttered and aimed more at showing off the space, rather than showing people or other activities.
• We limited ourselves to Creative Commons photos that allow "sharing" and "remixing".
• Automatic filtering of images: each image must satisfy the following conditions:
  – JPEG format
  – ≥ 6 megapixel resolution
  – ≤ 32 megabyte file size
  – at least one pixel has color (defined as minimum difference between RGB channels ≥ 10)
  – focal length specified in an EXIF header (obtained with the jhead utility program)
  – the camera model exists in a camera database.

Figure 1: Interface for filtering images that do not match their scene label.

Instructions. Users are instructed to click on pictures that match a specific category label, while excluding:

• Rotated photos (an overhead view is acceptable)
• Black-and-white or sepia-tone photos
• Photos with special effects or superimposed text
• Very blurry photos
• Rooms under construction
• Toy models of a scene
• Naked people or sexual poses
• Pictures of mostly people, where the scene is not clearly visible.

Interface. Users are presented with a grid of 50 images, each annotated with the Flickr search query (e.g. "living room", "kitchen", etc.). The photos are grouped by category, and then …
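The duplicate detection and automatic filtering conditions listed under "Downloading images" could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Photo` record, the function names, and the constants are hypothetical. It assumes that 6 megapixels means 6 × 10^6 pixels, that 32 megabytes means 32 × 2^20 bytes, and that the color test compares the largest and smallest channel value of a pixel, which is only one possible reading of the criterion. Image decoding and EXIF extraction (done with jhead in the text) are assumed to have happened beforehand.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255

MIN_PIXELS = 6_000_000        # assumed meaning of "6 megapixel"
MAX_BYTES = 32 * 1024 * 1024  # assumed meaning of "32 megabyte"
MIN_CHANNEL_SPREAD = 10       # channel difference threshold from the text


@dataclass
class Photo:
    """Hypothetical record holding an already decoded photo."""
    data: bytes                    # raw file contents
    fmt: str                       # container format, e.g. "JPEG"
    width: int
    height: int
    pixels: List[Pixel]            # decoded RGB pixels
    focal_length: Optional[float]  # from EXIF, None if missing
    camera_model: Optional[str]    # from EXIF, None if missing


def dedupe_by_md5(photos: Iterable[Photo]) -> List[Photo]:
    """Keep the first photo seen for each md5 digest of the file bytes."""
    seen: Set[str] = set()
    unique: List[Photo] = []
    for photo in photos:
        digest = hashlib.md5(photo.data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(photo)
    return unique


def has_color(pixels: Iterable[Pixel]) -> bool:
    """True if any pixel's channels differ by at least the threshold.

    Assumption: the difference is taken as max(R, G, B) - min(R, G, B).
    """
    return any(max(p) - min(p) >= MIN_CHANNEL_SPREAD for p in pixels)


def passes_filters(photo: Photo, known_cameras: Set[str]) -> bool:
    """Apply the listed conditions; all must hold for the photo to be kept."""
    return (
        photo.fmt.upper() == "JPEG"
        and photo.width * photo.height >= MIN_PIXELS
        and len(photo.data) <= MAX_BYTES
        and has_color(photo.pixels)
        and photo.focal_length is not None
        and photo.camera_model in known_cameras
    )
```

A caller would typically run `dedupe_by_md5` first and then keep only the photos for which `passes_filters` returns true. Because the digest depends only on file contents, re-running the script over the same downloads keeps the same set of unique photos.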